Molecular Ecology — Latest Matching Preprints

1

Pigeon-Guano-Contaminated Environments in Blantyre, Southern Malawi, are Reservoirs of Medically Important Fungi

Merico, B. J.; Chigwechokha, P.; Alubino, P.; Bandawe, G. P.

2026-05-30 occupational and environmental health 10.64898/2026.05.26.26354139 medRxiv

Top 6%

0.3%

Close to 50% of all bird species are reservoirs of potentially pathogenic fungi, including those listed as priority by the World Health Organization. In Malawi, data on diversity, pathogenic potential, and ecological avian sources of medically important yeast are scarce. A cross-sectional study using a descriptive approach was conducted in Blantyre, Southern Malawi, to characterise medically important yeasts recovered from environments contaminated with excreta/guano from synanthropic pigeons. A total of 20 samples were collected from 4 peri-urban areas, which yielded 71 yeast isolates. To assess the pathogenic potential of the environmental isolates, we compared their phenotypic virulence traits with those of 21 clinical yeast isolates collected from referral hospital laboratories. Pichia kudriavzevii (39%) and Candida orthopsilosis (30%) were the commonly isolated species in the pigeon-guano-contaminated environments. Candida parapsilosis sensu stricto (29%) and Candida albicans (24%) constituted most of the clinical yeast isolates. Half of the species isolated in the pigeon-guano-contaminated environments were also identified among the clinical isolates. A majority of the environmental isolates showed virulence traits similar to or stronger than clinical isolates. The findings underscore the critical need for integrated surveillance under the One Health framework, especially in bird-inhabited spaces close to human settlements.

2

Contributions of immune cell biomarkers to explaining differences in mortality risk by sex in the Health and Retirement Study

Yin, M. A.; Nguyen, V.; Nathan, A.; Patel, C.

2026-05-29 epidemiology 10.64898/2026.05.27.26354256 medRxiv

Top 6%

0.3%

Background: It is well-established that males have a higher mortality risk than females. Immune cells and their function are known to undergo characteristic changes during aging, and immune cells are known to have sex differences. Immune cells and their function have been linked to mortality risk, but no studies have investigated to what degree, if at all, Immune Cell Biomarkers (ICBs) contribute to the known differences in mortality risk by sex. Methods: Using participant data from the Health and Retirement Study (n = 8,822), we applied multivariable linear regressions adjusting for age, cytomegalovirus (CMV) serostatus, sex, and race/ethnicity to identify differences by sex in 48 immune cell biomarker (ICB, e.g. T cells, B cells, Monocytes, etc.) percentages and counts (measured in 2016). We studied how the associations between ICBs and mortality risk differ by sex using stratified Cox Proportional Hazard (CPH) models. We estimated how inclusion of sex explained the relationship between ICBs and all-cause mortality, and conversely, how inclusion of individual and all ICBs combined explain the relationship between sex and all-cause mortality using multivariable modeling approaches. Results: Differences in ICBs by sex range between 2-38% (39/48 statistically significant). 9 ICBs were significantly associated with mortality risk in the entire sample. While different ICBs were significantly associated with mortality risk in the stratified analyses, particularly with respect to monocyte, B cell, and NK cell populations, adjusting for sex modestly influenced the hazard ratios of the ICBs (sex: 8 ICBs, percent change <5.4%). Furthermore, individual and cumulative contributions of ICBs in explaining the differences in mortality risk by sex were not significant.

3

Dentine markers of pre/early postnatal lead exposure links with brain, cognitive, and behavioral outcomes in adolescents

Marshall, A. T.; Kan, E.; Adise, S.; König, M.; McConnell, R.; Martinez, M.; Midya, V.; Arora, M.; Sowell, E. R.

2026-05-27 pediatrics 10.64898/2026.05.26.26354134 medRxiv

Top 7%

0.2%

Lead is a toxic metal ubiquitous in our environment. While dramatic reductions in lead sources have paralleled equivalent decreases in lead-poisoning rates, chronic lead exposure remains a critical public health concern. Childhood lead exposure (at its lowest levels) is linked to changes in cognitive development, but less is known about lead's effects on children's brain structure, especially as a result of in utero exposure. We measured prenatal and early-postnatal lead exposure in shed deciduous teeth of 448 9- and 10-year-old children (from 20 United States cities) and linked those lead levels to childhood brain structure, cognition/behavior, and neighborhood- and family-level socioeconomic characteristics. Here we show negative associations between tooth-lead levels and the thickness of the brain's cortex, particularly in regions linked to language processing. With increasing tooth-lead levels, children of lower-income (versus higher-income) families showed steeper declines in receptive vocabulary. Caregiver-reported behavioral problems exhibited similar associations. With in utero exposure linked to adverse neurodevelopmental outcomes (well before lead exposure and its risks are evaluated by healthcare professionals), prenatal screening of maternal lead levels/exposure, coupled with recommended strategies to reduce its placental transmission, may help reduce lead's effects on future generations.

4

Dengue spatiotemporal patterns in Minas Gerais, Brazil, 2014-2023: regional epidemic forces dominate over the environmental impact of the Brumadinho dam collapse

Fernandes, G. d. R.; Vaz, A. B. M.; Fonseca, P. L. C.; Oliveira, W. K.; Aguiar, E. R. G. R.; Lopes, B. C.; Mota-Filho, C. R.; Castro, M. L. P.; Starling, C. E.

2026-05-26 epidemiology 10.64898/2026.05.19.26353615 medRxiv

Top 9%

0.2%

Background: Dengue is a major public health problem in Brazil, and Minas Gerais is one of the states with the highest burden. In January 2019, the Brumadinho dam collapse released about 12 million cubic meters of iron ore tailings into the Paraopeba River basin, causing environmental disturbance that could plausibly affect vector habitats and dengue transmission. We evaluated the spatiotemporal dynamics of dengue in Minas Gerais from 2014 to 2023 and tested whether the disaster was associated with changes in affected municipalities. Methods: We performed an ecological spatiotemporal analysis using dengue notifications from SINAN for all municipalities in Minas Gerais (2014-2023). Municipalities were classified as Paraopeba basin, regional controls, or state controls. Temporal similarity was assessed using Pearson correlation-based hierarchical clustering and non-metric multidimensional scaling (NMDS). Sources of variation were examined with PERMANOVA and principal component analysis (PCA). A linear mixed-effects model with municipality as a random effect was used to test changes after 2019, with pre/post contrasts estimated from marginal means. Results: Dengue showed strong temporal synchrony across the state, with major epidemic peaks in 2015-2016, 2019, and 2023. Health region explained 31.5% of the variation in temporal incidence profiles (p = 0.001), whereas Paraopeba basin status explained no significant variation (p = 0.998). No temporal cluster was enriched for municipalities in the Paraopeba basin. PCA identified 2023, 2019, and 2016 as the main years driving variability. In the mixed model, year was significant (p < 0.001), but Paraopeba basin status and its interaction with time were not. Incidence increased significantly after 2019 in non-exposed municipalities (p < 0.001), but not in basin municipalities (p = 0.088). 
Conclusions: Dengue dynamics in Minas Gerais were driven mainly by regional and state-wide epidemic processes, with no significant independent effect of the Brumadinho dam collapse on notified dengue patterns.

5

High-dimensional Characterization of Genome-Environment Fitness Landscapes in Klebsiella pneumoniae

Zhou, G.; Williams, G.; Millner, M. T.; AlHirayban, R.; Alosaimi, W.; Fallatah, O.; Hart, A. J.; Malaikah, M.; Iftikhar, S.; Ahmad, H.; Roghanian, M.; Mustonen, V.; AlYami, R.; Banzhaf, M.; Moradigaravand, D.

2026-05-30 genetic and genomic medicine 10.64898/2026.05.28.26354339 medRxiv

Top 9%

0.2%

Background Bacterial fitness is shaped by interactions between genome variation and environmental context, yet how these interactions determine its predictability and heritability remains unclear. In the clinically important pathogen Klebsiella pneumoniae, a leading cause of hospital-acquired infections, this question is particularly pressing. Despite extensive genomic characterization, we still lack a systematic understanding of how genome-wide variation translates into fitness across diverse environments in K. pneumoniae. Methods We filled this gap by profiling a systematic collection of 1,462 clinical K. pneumoniae isolates across 214 diverse environmental and pharmacological stress conditions using high-throughput chemical genomics. Fitness was quantified from colony growth and integrated with whole-genome sequencing data. Genome-wide association analyses identified genetic determinants of fitness, and machine learning models incorporating genomic features were used to predict fitness. Results Fitness exhibited a strongly environment-dependent genetic architecture, with modest but significant concordance between genetic background and phenotypic variation. Under antibiotic and stress-combination conditions, fitness was driven by discrete, high-effect determinants, including known resistance genes, resulting in stronger signals and improved predictability. In contrast, non-antibiotic environments showed more polygenic and distributed architectures with weaker associations. Genome-wide analyses identified both established and previously uncharacterized genes linked with fitness across conditions. Resistance and virulence determinants exhibited clear context-dependent trade-offs, conferring fitness advantages under selection but imposing costs in non-selective environments. Consistent with this, plasmid carriage showed environment- and genotype-dependent fitness effects, with benefits under antibiotic pressure and measurable costs otherwise.
Genomic variant-based models for fitness prediction achieved moderate performance (mean Spearman correlation (ρ) = 0.36 (95% CI: 0.18-0.67) for predicted versus observed values in unseen data) across conditions, with improved accuracy under strong antibiotic selective pressures, and produced well-calibrated prediction intervals with high coverage. Despite strong population structure effect on predictions, models captured predictive gene and SNP biomarkers for fitness. Conclusion These findings highlight that bacterial fitness is an emergent property of genome-environment interactions rather than a fixed attribute of genotype. This work establishes a unified high-dimensional genotype-phenotype framework linking genomic variation to fitness across diverse conditions in a major pathogen, with broader implications for other pathogenic bacterial species.

6

Association of Clonal Hematopoiesis with Total and Cause-Specific Mortality Among Older Women

Chang, A.; Ezzat, D.; Uddin, M. M.; Pershad, Y.; Collins, J. M.; Kitzman, J.; Jaiswal, S.; Desai, P.; Shadyab, A.; Anderson, G. L.; Casanova, R.; Wallace, R.; Wactawski-Wende, J.; Bick, A. G.; Natarajan, P.; Kooperberg, C.; LaMonte, M. J.; Whitsel, E. A.; Manson, J. E.; Reiner, A. P.; Honigberg, M. C.

2026-06-01 cardiovascular medicine 10.64898/2026.05.28.26354392 medRxiv

Top 9%

0.1%

Clonal hematopoiesis of indeterminate potential (CHIP) represents the age-related expansion of hematopoietic stem cells with preleukemic mutations. However, its association with all-cause and cause-specific mortality has not been well characterized in older adults. We aimed to evaluate whether CHIP is associated with all-cause and cause-specific mortality in a population of older women in the United States. Our study included 6,704 participants in the Women's Health Initiative Long Life Study (WHI-LLS) without hematologic malignancy. The co-primary exposures were any CHIP (variant allele frequency [VAF] ≥ 2%) and large CHIP (VAF ≥ 10%), and the primary outcome was all-cause mortality. Multivariable-adjusted Cox proportional hazards models tested the associations of CHIP and CHIP subtypes with all-cause and cause-specific mortality. Any CHIP and large CHIP were independently associated with all-cause mortality, with multivariable-adjusted hazard ratios (aHRs) of 1.12 (95% confidence interval [CI] 1.04-1.21; P = 0.003) and 1.28 (95% CI 1.15-1.43; P < 0.001), respectively. In gene-specific analyses, non-DNMT3A CHIP was associated with all-cause mortality (aHR: 1.22 [95% CI: 1.12-1.34], P < 0.001), while DNMT3A CHIP was not (aHR: 1.07 [95% CI: 0.98-1.18], P = 0.13). Furthermore, large CHIP was associated with cardiovascular (aHR: 1.29 [95% CI: 1.08-1.55], P = 0.006), cancer (aHR: 1.49 [95% CI: 1.11-2.02], P = 0.009), and neurologic (aHR: 1.40 [95% CI: 1.07-1.84], P = 0.02) death. In this cohort of older women, CHIP, particularly large clones and non-DNMT3A CHIP, was associated with all-cause and cause-specific mortality. These findings suggest that clonal size and subtype may differentially influence mortality risk.
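
A reader can sanity-check how the reported hazard ratios, confidence intervals, and P values relate to one another. The sketch below is not the authors' analysis: it assumes a Wald-type interval that is symmetric on the log scale, and the function name `p_from_hr_ci` is hypothetical. Because the published figures are rounded, the recovered P values are only approximate.

```python
import math

def p_from_hr_ci(hr, lo, hi, z_crit=1.96):
    """Approximate two-sided Wald P value from a hazard ratio and its 95% CI.

    Assumes the interval is exp(log(hr) +/- z_crit * se), which is the usual
    form for Cox model output but is an assumption here.
    """
    se = (math.log(hi) - math.log(lo)) / (2 * z_crit)  # SE of log(HR)
    z = math.log(hr) / se                              # Wald statistic
    p = math.erfc(abs(z) / math.sqrt(2))               # two-sided normal tail
    return se, z, p

# Values taken from the abstract above (rounded as published).
se_any, z_any, p_any = p_from_hr_ci(1.12, 1.04, 1.21)           # any CHIP
se_dnmt3a, z_dnmt3a, p_dnmt3a = p_from_hr_ci(1.07, 0.98, 1.18)  # DNMT3A CHIP

print(f"any CHIP:    z = {z_any:.2f}, p = {p_any:.4f}")
print(f"DNMT3A CHIP: z = {z_dnmt3a:.2f}, p = {p_dnmt3a:.3f}")
```

Under these assumptions the any-CHIP estimate recovers a P value close to the reported 0.003, and the DNMT3A estimate stays above 0.05, in line with the abstract's statement that it was not significant.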

7

Sleep Disorders Modify the Age-Related Trajectory of Circadian Rest-Activity Rhythms: Evidence from NHANES 2011-2012 Wrist Actigraphy

Yin, L.; Lee, C. W.; Wong, A.

2026-06-01 epidemiology 10.64898/2026.05.28.26354369 medRxiv

Top 9%

0.1%

Background: Circadian rest-activity rhythms weaken with age, but whether sleep disorders modify this trajectory is unknown. Methods: We analyzed wrist accelerometry data from 4,386 participants aged 6-80 years in the 2011-2012 National Health and Nutrition Examination Survey (NHANES). Circadian features were extracted using cosinor analysis and nonparametric methods; a Circadian Disruption Index (CDI) was constructed from five standardized components. Survey-weighted regression with natural cubic splines and Wald F-tests tested age-by-sleep-disorder interactions using Taylor series linearization for variance estimation. Results: Doctor-diagnosed sleep disorder (N = 360, 8.2%) was associated with significantly different age-related trajectories of amplitude (F(2,17) = 11.24, p = 0.0008) and MESOR (F(2,17) = 8.22, p = 0.0032), both surviving Bonferroni correction (p < 0.006). CDI was higher in those with a sleep disorder (0.290 vs. 0.131, p < 0.001) and was independently associated with higher BMI (beta = 1.33 kg/m2, p < 0.001), higher HbA1c (beta = 0.089%, p = 0.004), greater diabetes prevalence (beta = 3.8 percentage points, p < 0.001), and worse depressive symptoms (beta = 0.43 PHQ-9 points, p = 0.020). Sensitivity analyses using a broader sleep problem exposure did not replicate these interactions. Conclusions: Doctor-diagnosed sleep disorders are associated with an altered age-related decline in circadian amplitude and mean activity level. CDI was independently linked to cardiometabolic and depressive outcomes, supporting a mechanistic connection between clinically significant sleep pathology and circadian disruption across the lifespan.
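
For readers unfamiliar with cosinor analysis, the sketch below shows how MESOR and amplitude can be estimated from an activity series. It is a minimal illustration, not the authors' pipeline: it assumes a single 24-hour component and evenly spaced samples covering a whole number of periods, in which case the least-squares estimates reduce to simple averages. The function name `cosinor_fit` and the synthetic data are hypothetical.

```python
import math

def cosinor_fit(times_h, values, period_h=24.0):
    """Single-component cosinor: y = MESOR + A * cos(omega * (t - peak)).

    Assumes evenly spaced samples spanning whole periods, so the cosine and
    sine regressors are orthogonal and the least-squares solution is a
    projection onto each of them.
    """
    n = len(values)
    omega = 2 * math.pi / period_h
    mesor = sum(values) / n  # rhythm-adjusted mean
    beta = 2 * sum(v * math.cos(omega * t) for t, v in zip(times_h, values)) / n
    gamma = 2 * sum(v * math.sin(omega * t) for t, v in zip(times_h, values)) / n
    amplitude = math.hypot(beta, gamma)  # half the peak-to-trough range
    peak_time_h = (math.atan2(gamma, beta) / omega) % period_h  # clock time of peak
    return mesor, amplitude, peak_time_h

# Synthetic two-day hourly series: mean 10, amplitude 3, peak at 15:00.
hours = list(range(48))
activity = [10 + 3 * math.cos(2 * math.pi * (t - 15) / 24) for t in hours]
mesor, amplitude, peak_time_h = cosinor_fit(hours, activity)
print(mesor, amplitude, peak_time_h)
```

Real actigraphy is noisy and unevenly sampled, so a general least-squares solver would normally replace the averaging shortcut used here.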

8

Why epidemic risk at the 2026 World Cup may not be what you think

Lessler, J.; Smith, C. P.; Das, P.; Sykes, A. L.; Urbinati, A.; Geith, K.; Powers, K. A.; Davis, J. T.; Kern-Allely, S. C.; Vega Yon, G. G.; Lofgren, E. T.; Pearson, C. A. B.; Vespignani, A.

2026-06-01 epidemiology 10.64898/2026.05.28.26354384 medRxiv

Top 10%

0.1%

Background: The 2026 FIFA World Cup may bring over one million visitors to North America from around the globe to participate in mass gathering events. The nature of the event and recent news have raised concerns for some that the tournament could lead to infectious disease outbreaks or fuel existing epidemics. Objective: To systematically assess the infectious disease threat posed to the United States by the tournament. Design: A multi-institutional team evaluated pathogen-specific risk across three dimensions: importation, outbreak potential, and impact to identify a priority pathogen list. A systematic screening protocol ensured common criteria and that pathogen information was collected when necessary to inform inclusion. Results: Increased risk from the World Cup is near zero for 63 of 77 evaluated pathogens. Pathogens were predominantly excluded as threats due to low excess importation risk and low outbreak potential if introduced. The remaining priority pathogens fall into five categories: (a) mosquito borne pathogens with the potential for sustained transmission in some host cities, (b) seasonal respiratory viruses, (c) chronic infections with high prevalence outside the United States, (d) pathogens present in the United States with likely increased transmission at World Cup activities, and (e) high-consequence infectious threats. Limitations: Data availability is variable across diseases. Impact calculations may not reflect actual costs to host cities. Disease incidence in World Cup travelers may differ from national incidence rates. Conclusion: While infectious disease outbreaks at the 2026 FIFA World Cup are possible, in an already highly connected world where large gatherings are frequent, the elevated risk from the tournament is not as extreme as it first may seem.

9

DNA Methylation Signatures of Atherosclerosis and Vascular-Related Outcomes in U.S. and Irish Population-Based Cohorts

Ammous, F.; Smith, T.; Scarlett, S.; Hernandez, B.; McCrory, C.; Kenny, R. A.; Mitchell, C.; Faul, J. D.

2026-05-27 epidemiology 10.64898/2026.05.25.26354072 medRxiv

Top 10%

0.1%

Atherosclerosis is a systemic vascular process linked to cardiovascular, cognitive and renal outcomes. DNA methylation (DNAm)-based scores of atherosclerosis may capture cumulative biological processes underlying vascular aging. Here, we examined associations of DNAm scores for coronary artery calcification (DNAm-CAC) and carotid plaque (DNAm-cPlaque), derived from a large study of imaging-based subclinical atherosclerosis, with prevalent and incident outcomes in two population-based cohorts of older adults: the Health and Retirement Study (HRS; n = 3,875) and The Irish Longitudinal Study on Ageing (TILDA; n = 487). Higher DNAm scores were associated with adverse cardiometabolic profiles and socioeconomic indicators. In HRS, higher DNAm-CAC was associated with prevalent cardiovascular disease (odds ratio per SD, 1.16; 95% confidence interval (CI), 1.07-1.26), lower cognitive function (β = -0.50, 95% CI -0.68 to -0.32) and lower estimated glomerular filtration rate (eGFR; -1.7 ml/min/1.73 m², 95% CI -2.6 to -0.8) in unadjusted models. After adjustment for demographic and clinical risk factors, DNAm-CAC (β = -0.29, 95% CI -0.46 to -0.13) and DNAm-cPlaque (β = -0.24, 95% CI -0.42 to -0.06) remained associated with lower cognitive function, and DNAm-cPlaque was associated with incident cognitive impairment or dementia (hazard ratio per SD, 1.16; 95% CI, 1.01-1.32). Associations were attenuated after further adjustment for race/ethnicity and socioeconomic indicators. In TILDA, higher DNAm-cPlaque was associated with worse cognitive performance (incidence rate ratio, 1.11; 95% CI, 1.01-1.21), increased risk of incident cardiovascular disease (hazard ratio, 1.18; 95% CI, 1.00-1.42) and lower eGFR, with consistent associations observed for DNAm-CAC. These findings suggest that DNAm-based scores of atherosclerosis capture systemic vascular processes linked to multiple age-related outcomes across populations.
Further work is needed to clarify the biological pathways reflected by these scores and their relation to cumulative and socially patterned vascular risk.

10

Spatial variation in incidence of meningococcal meningitis: evidence from a large historical epidemic in Glasgow

Stewart, G.; Schroeder, M.; Mancy, R.; Angelopoulos, K.

2026-05-30 epidemiology 10.64898/2026.05.28.26354324 medRxiv

Top 11%

0.1%

Large epidemics of invasive meningococcal disease are rare in temperate regions. Here, we analyse administrative data on the largely forgotten epidemic of bacterial meningococcal meningitis that occurred in Glasgow in 1907, probably the largest on record in the UK. The epidemic, predominantly confined to the city, killed around 1,000 people, had a case fatality rate of nearly 70%, and hit infants and young children the hardest. We show the rapid rise and fall in cases and the spatial distribution of incidence and mortality rates within the city. We find that within-household overcrowding was a key driver of incidence whereas between-household geographic proximity was not. We also find that the spatial distribution of disease risk during the epidemic persisted in the post-epidemic period and during a later outbreak. The findings suggest that interventions should prioritise populations in areas that have experienced higher incidence rates to mitigate the risk of future outbreaks.

11

Inferring Sexual Network Bridging Using Genomics: A Simulation Study

Kline, M. C.; Helekal, D.; Oliveira Roster, K. I.; Grad, Y.

2026-05-26 infectious diseases 10.64898/2026.05.24.26353967 medRxiv

Top 11%

0.1%

The dynamics of sexually transmitted infections involve interconnected transmission networks, including men who have sex with men and heterosexual populations. Understanding the extent of bridging between these networks can inform surveillance, guide interventions, and aid in the interpretation of their impact, but methods for quantifying bridging have been lacking. Here, we addressed whether pathogen genomics tools, successfully used to reconstruct transmission in other contexts, could accurately infer sexual network bridging. Based on simulations of gonorrhea spread, we evaluated phylodynamic bridging metrics inferred by ancestral state reconstruction under a range of sampling schemes, from comprehensive to sparse. These metrics differentiated sexual network structures even with biased sampling schemes, but accuracy depended on the sampling scheme and density: phylodynamic bridging estimates using sequences from all detected infections for one network configuration were on average 6.9% above the true value, whereas estimates from 5% of infections in symptomatic men with many partners were on average >1000% above the true value. These results suggest routine overestimation of bridging from unadjusted inferences from genomics data and provide context for interpreting existing genomic surveillance data and targeted studies.

12

Distinguishing Age-specific Patterns in Comorbidities of Obstructive Sleep Apnea Using Real-World Data

Goodman, M. O.; Alex, R. M.; Sands, S. A.; Azarbarzin, A.; Batool-anwar, S.; Pavlova, M. K.; Epstein, L. J.; Redline, S.; Cade, B. E.

2026-05-28 epidemiology 10.64898/2026.05.20.26352336 medRxiv

Top 12%

0.0%

Obstructive sleep apnea (OSA) is associated with a wide range of comorbidities, but the extent to which these follow predictable, age-dependent patterns is not well understood. Identifying such patterns could provide insight into OSA heterogeneity and its links to physiological measures of OSA. We trained age-dependent topic models (ATM) on longitudinal electronic health records from 36,426 patients with OSA in the Mass General Brigham Biobank. ATM organizes incident diagnoses into distinct comorbidity "topics," whose age-specific disease loadings represent predictive patterns linking related diagnoses across the life course. We applied the trained model to compute individual-level topic scores in independent data: a cohort of 11,689 OSA cases and 22,695 matched controls, and a cohort of 6,220 patients with polysomnography (PSG)-derived physiological measures. We identified 19 distinct age-dependent comorbidity profiles, all significantly associated with OSA case status (FDR-adjusted p<0.05). Topics reflected recognizable clusters including metabolic, neuropsychiatric, and immune-mediated conditions, and several were distinguished by age-of-onset of key comorbidities, such as early- vs late-onset asthma. Seventeen of the 19 topics were significantly associated with at least one of 13 PSG-derived physiological measures, including associations between cardiometabolic topics and the apnea-hypopnea index, sleep apnea specific hypoxic burden, and respiratory event-specific heart rate burden. These findings indicate that age-dependent comorbidity patterns distinguish meaningful OSA subtypes with differing prognoses and endophenotype associations. ATM offers insight into complex OSA comorbidity and suggests that age-informed, topic-based stratification may improve individualized risk assessment, interpretation of PSG findings, and targeting of clinical interventions.

13

High Incidence of Adverse Pregnancy Outcomes are Associated with Maternal Age and Infection Status in a Resource-Limited Community

Kituyi, S. N.; Odongo, A. O.; Wachuka, R.; Wambua, S.; Kobia, F.; Gitaka, J.; Kanoi, B. N.

2026-06-01 epidemiology 10.64898/2026.05.29.26354424 medRxiv

Top 12%

0.0%

Maternal health during pregnancy is critical for favorable birth outcomes and long-term wellbeing of both mothers and infants. Women in rural, malaria-endemic regions face unique biological and socioeconomic challenges that may increase the risk of adverse pregnancy outcomes (APOs). This study investigated the incidence and determinants of APOs among pregnant women attending antenatal care at Webuye sub-County Hospital in Western Kenya, a rural malaria-endemic setting. We conducted a retrospective cohort analysis utilizing previously collected data of 300 women enrolled during early pregnancy and followed through delivery. Maternal demographic, clinical, and infection-related factors were assessed, and associations with APOs were evaluated using chi-square tests and multivariable logistic regression. Maternal age and gestational age at enrollment were significantly associated with malaria history (P<0.001). Maternal BMI abnormality (124.5/1000 pregnancies), anemia (99.3/1000), fetal or neonatal death (81.3/1000), and preterm birth (43.8/1000) were observed (all P<0.001), suggesting a substantial burden. Younger mothers (<20 years) and older mothers (>35 years) were significantly more likely to develop anemia (P =0.026), and prior malaria infection further increased anemia risk (P =0.02). Abnormal urinalysis findings indicative of urinary tract infection were significantly associated with low birthweight (P =0.031). No significant associations were found between APOs and infant sex, parity, gravidity, or maternal ABO blood type. These findings highlight a substantial burden of APOs in this rural population, exceeding national and global estimates. Strengthening malaria prevention, nutritional support, urinary infection screening, and encouraging early antenatal care attendance are critical to improving maternal and neonatal outcomes. 
Targeted interventions for adolescent and older mothers, along with enhanced point-of-care diagnostics, may reduce preventable complications in similar resource-limited, malaria-endemic settings.

14

Two anti-phase spatial modes and a candidate spatial-persistence regime transition of SARS-CoV-2 in Japan: a 159-week prefecture-level sentinel surveillance study

Nakano, T.; Onozuka, D.; Ikeda, Y.; Washiyama, K.; Takashima, Y.

2026-05-26 epidemiology 10.64898/2026.05.24.26353972 medRxiv

Top 14%

0.0%

Background. On 8 May 2023 the Japanese Ministry of Health, Labour and Welfare reclassified COVID-19 under the Infectious Disease Control Law from a designated infectious disease (with case-by-case reporting requirements comparable to those of a Category-2 disease) to a Category-5 ("Class-5") notifiable disease, joining the same category as seasonal influenza and most other endemic respiratory infections. Under this regime, COVID-19 case counts are reported weekly from a nationwide network of sentinel medical facilities (initially approximately 5,000, reduced to approximately 3,000 following an April 2025 surveillance reform), and individual case reporting is no longer required. We aimed to characterize the spatial topology of COVID-19 epidemics under this sentinel-surveillance regime and to detect, in a data-driven manner, any structural change in epidemic dynamics over this period. Methods. We analyzed weekly per-sentinel-facility COVID-19 case counts in all 47 prefectures of Japan from 2023-W17 to 2026-W19 (159 weeks). For each week we computed the Shannon pseudo-entropy S of the prefecture-share distribution and global, local, and time-lagged Moran's I across a 92-edge contiguity-based adjacency matrix. To identify any structural change in a data-driven manner, we adopted a two-stage approach motivated by an empirical regularity established in Section 3: we first verified the wave-amplitude-invariant entropy ceiling (S_max >= 3.80 in all five pre-transition waves), then restricted change-point detection to the weeks after S(t) last attained this ceiling, applying PELT, CUSUM, and Bai-Perron sup-F within this restricted region. Seasonal structure was characterized by truncated Fourier regression with first-order autoregressive errors (Cochrane-Orcutt) over harmonic orders K = 1 to 6; between-period comparisons used moving block bootstrap as the principal inferential statistic. Results. 
The five epidemic waves during 2023-2025 followed a stereotyped spatial template in which S(t) traced a characteristic U-shape around each peak, with a wave-amplitude-invariant entropy ceiling reaching on average 99.4% of the theoretical maximum ln 47 (range 3.820-3.836, SD 0.006). The last week in which S(t) attained this entropy ceiling was 2025-W42. Restricting change-point detection to the 29 subsequent weeks, PELT and CUSUM localised the structural break to late 2025: PELT identified 2025-W48 (robust across penalty values >= sigma^2*ln(n) and across entropy-ceiling thresholds 3.78-3.82) and CUSUM peaked at 2025-W50 (p < 0.0001), placing the break within a two-week window centred on late November 2025. Bai-Perron sup-F peaked later at 2026-W02 (p = 0.062, with reduced power on n = 29). We adopted 2025-W48 as the principal change-point, defining 135 pre-transition weeks and 24 post-transition weeks. Two anti-phase spatial modes were identified in the pre-transition record: a summer-onset Okinawa-seeded Kyushu cascade (Mode A; annual peak epi week 26) and a winter-onset Tohoku-centred connected-cluster mode (Mode B; annual peak epi week 51), approximately 25 epi weeks out of phase. After the regime transition, this ceiling was not attained, and the spatial-persistence ratio I(tau = 8 wk)/I(0) shifted from a highly variable distribution centred near 0.27 (pre-transition, 125 weeks) to a tightly clustered distribution around 0.89 (post-transition, 24 weeks); the mean difference was 0.62 (95% bootstrap CI 0.32 to 0.90; moving block bootstrap p < 0.0001 across block lengths 1-12). The principal finding remained significant under autoregressive-augmented null models and was robust to adjacency-matrix choice, the April 2025 surveillance reform, harmonic order K = 1 to 6, and Okinawa exclusion. Conclusions. 
Data-driven analysis of 159 weeks of Japanese sentinel surveillance identifies a candidate spatial-persistence regime transition emerging in late November 2025, in which the spatial structure of weekly case shares persists for at least 8 weeks rather than dissipating as in pre-transition. The transition coincides with loss of the wave-amplitude-invariant entropy ceiling and with absence of the Mode A signature through the observed post-transition period. The recent uptick in Okinawa case shares (continuing through 2026-W19) leaves open whether the Mode A signature is structurally suppressed or merely deferred; observation through summer 2026 is required to distinguish a sustained shift from a transient anomaly.
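
The entropy ceiling in this abstract can be made concrete with a short calculation. The sketch below assumes S is the standard Shannon entropy (natural log) of each prefecture's share of the weekly total; the abstract calls it a "pseudo-entropy" and the authors' exact definition may differ. The function name `shannon_entropy` and the example inputs are hypothetical.

```python
import math

def shannon_entropy(counts):
    """Shannon entropy (nats) of the share distribution implied by counts."""
    total = sum(counts)
    shares = [c / total for c in counts if c > 0]  # zero shares contribute nothing
    return -sum(p * math.log(p) for p in shares)

n_prefectures = 47
s_theoretical_max = math.log(n_prefectures)  # reached only when all shares are equal

# Perfectly even spread across prefectures hits the theoretical maximum.
s_uniform = shannon_entropy([1.0] * n_prefectures)

# All cases in a single prefecture gives zero entropy.
s_concentrated = shannon_entropy([100.0] + [0.0] * (n_prefectures - 1))

# The abstract reports a ceiling averaging 99.4% of ln 47.
s_reported_ceiling = 0.994 * s_theoretical_max
print(round(s_theoretical_max, 3), round(s_reported_ceiling, 3))
```

Under this reading, 99.4% of ln 47 falls inside the reported ceiling range of 3.820-3.836, which means case shares were close to evenly spread across prefectures at those points.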

15

Can Large Language Models Diagnose Primary Immunodeficiency from Patient-Described Symptoms?

Reteig, L. C.; Woloshin, S.; Maglione, P. J.; Farmer, J. R.; Ong, M.-S.

2026-05-27 allergy and immunology 10.64898/2026.05.26.26353818 medRxiv

Top 14%

0.0%

Patients with primary immunodeficiency (PID) often face prolonged diagnostic delays and may increasingly turn to large language models (LLMs) to interpret their symptoms during this period. We evaluated whether an LLM could recognize PID from symptom descriptions derived from interviews with 21 PID patients. In a prior study, we showed that GPT-4o identified PID in 96% of cases when prompted with physician-written patient histories (Rider et al., JACI, 2024). Here, when prompted with symptom descriptions in patients' own words, GPT-5 identified PID in only 7 cases (33%), although it more broadly suggested immune system issues in 18 cases (81%). The gap between these findings indicates that LLMs are sensitive to the language and framing of symptom descriptions, performing substantially worse when patients describe their own symptoms in everyday language than when clinicians summarize patient histories in structured medical terms. This study underscores the need to carefully evaluate how LLMs are used in patient-facing applications.

16

An ECG foundation model for generalizable cardiac function prediction across the lifespan

Yang, Y.; Peracchio, L.; Mayourian, J.; Miller, T.; La Cava, W.

2026-05-27 health informatics 10.64898/2026.05.26.26354128 medRxiv

Top 14%

0.0%

Background: Artificial intelligence-enhanced electrocardiography (AI-ECG) enables scalable, low-cost cardiac dysfunction screening, but existing models are annotation-intensive and predominantly adult-derived, leaving paediatric generalizability uncertain. Paediatric cohorts exhibit highly variable cardiac morphology and function compared to adults, which may be useful for learning generalizable AI-ECG models. Methods: We pretrained ECG-Fyler on a predominantly paediatric, all-age cohort at Boston Children's Hospital (1992-2023), annotated with a cardiology-specific coding system (Fyler codes), and evaluated it on assessments from echocardiography (echo) and cardiac magnetic resonance (CMR) studies. We validated on an external adult cohort from Columbia University Irving Medical Center. Performance was benchmarked against several AI-ECG foundation models by AUROC across age groups, lesion types, and limited-data scenarios. Findings: The pretraining cohort comprised 782,138 ECGs from 255,271 patients (median age: 10.9 years, IQR: [2.8-16.8]). Internal evaluation included 178,495 ECG-echo pairs (median age: 10.9 [3.7-17.0]) and 8,584 ECG-CMR pairs (median age: 20.7 [15.6-29.6]). External validation included 82,543 ECG-echo pairs from adults (median age: 64.0 [52.0-74.0]). ECG-Fyler improved AUROC across biventricular dysfunction and dilation tasks, with the largest gains in low-data settings. In internal validation, ECG-Fyler detected low left ventricular ejection fraction (LVEF ≤ 40%) from only 100 fine-tuning samples (AUROC: 0.80, 95% CI: [0.78-0.80]), outperforming other models (AUROC < 0.65) and improving with additional fine-tuning (AUROC: 0.94 [0.93-0.94]). Similar improvements were observed for CMR-derived LVEF, RVEF, and ventricular dilation. In external validation on adults, ECG-Fyler exhibited an AUROC of 0.83 (CI: [0.82-0.85]) for LVEF ≤ 40%.
After fine-tuning on less than 10% of external data, LVEF ≤ 45% performance (AUROC: 0.87 [0.86-0.88]) outperformed a fully trained, site-specific prior model (AUROC: 0.85 [0.84-0.87]). Interpretation: Pretraining on richly annotated, paediatric-dominant ECGs yields models that transfer efficiently across institutions and ages, supporting AI-ECG screening and triage when labels or imaging access are limited. Funding: National Institutes of Health (R01LM012973); Kostin Innovation Fund, Boston Children's Hospital.

17

Patient Versus Prediction-Level Evaluation of a Dynamic Clinical Prediction Model of Sepsis

Tuttle, M.; Maas, C. C. H. M.; An, J.; Wessler, B. S.; Harvey, W. F.; Selker, H. P.; van Klaveren, D.; Kent, D. M.

2026-05-27 health systems and quality improvement 10.64898/2026.05.26.26354141 medRxiv

Top 14%

0.0%

The Epic Sepsis Model version 2 (ESMv2) is a prediction model embedded in the electronic medical record that warns clinicians which hospitalized patients are at risk for sepsis. We conducted a retrospective cohort study of 31,951 hospitalizations of 25,760 patients to compare analyses conducted at the commonly used patient level (where the maximum prediction prior to the onset of sepsis is used to measure performance) with analyses at a novel prediction level (where each prediction is used to measure performance). Sepsis, defined by the Sepsis-3 criteria, occurred during 1,049 hospitalizations (3.3%). Patient-level analyses suggested excellent discrimination (AUC 0.86; IQR 0.85, 0.87), whereas prediction-level analyses demonstrated lower performance (AUC 0.62; IQR 0.57, 0.65). Low estimates of the positive predictive value (14.5% at the patient level vs 4% at the prediction level) imply a high number of false alerts. Common evaluation approaches may overstate the performance of dynamic prediction models and mislead clinical decision-making.
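
The difference between the two evaluation levels described above can be illustrated with a toy sketch. It is not the authors' analysis: the scores are invented, the helper names `auc`, `patient_level_auc`, and `prediction_level_auc` are hypothetical, and it makes the simplifying assumption that every prediction inherits the outcome label of its hospitalization (the preprint may label individual predictions differently).

```python
def auc(pos, neg):
    """Probability that a random positive score outranks a random negative one.

    Ties count as half a win (the Mann-Whitney formulation of AUC).
    """
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical hospitalizations: (sequence of risk scores, developed sepsis?).
stays = [
    ([0.1, 0.2, 0.9], True),
    ([0.2, 0.1, 0.8], True),
    ([0.3, 0.4, 0.3], False),
    ([0.1, 0.5, 0.2], False),
]


def patient_level_auc(stays):
    # One score per hospitalization: the maximum prediction.
    pos = [max(scores) for scores, septic in stays if septic]
    neg = [max(scores) for scores, septic in stays if not septic]
    return auc(pos, neg)


def prediction_level_auc(stays):
    # Every individual prediction is scored on its own.
    pos = [s for scores, septic in stays if septic for s in scores]
    neg = [s for scores, septic in stays if not septic for s in scores]
    return auc(pos, neg)


patient_auc = patient_level_auc(stays)
prediction_auc = prediction_level_auc(stays)
print(f"patient-level AUC:    {patient_auc:.2f}")
print(f"prediction-level AUC: {prediction_auc:.2f}")
```

In this toy data, taking the maximum per stay separates the two groups cleanly, while the many low scores issued early in septic stays drag the prediction-level estimate down. That is one way the patient-level summary can look more favourable than the alerts clinicians actually see.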

18

Morphological feature remodeling of intracranial arteries in the context of inflammation and HIV-associated cognitive impairment

Hoang, N.; Yang, H.; Uddin, M. N.; Zhong, J.; Faiyaz, A.; Singh, M. V.; Boodoo, Z. D.; Sutton, K. R.; Wang, H. Z.; Sahin, B.; Khan, M. W.; Weber, M. T.; Yuan, C.; Chen, L.; Schifitto, G.

2026-05-27 hiv aids 10.64898/2026.05.19.26353071 medRxiv

Top 14%

0.0%

Background: Despite the success of combination antiretroviral therapy (cART), vascular comorbidities, including cerebrovascular disease, are more prominent in people living with HIV (PLWH) compared to people without HIV (PWOH). However, quantitative assessments of cerebrovascular morphometry and their associations with cognitive outcomes in the context of HIV are still limited. In this study, we explore this missing link. Methods: Magnetic Resonance Angiography (MRA) data, blood markers, and neurocognitive assessments were collected from 73 PWOH subjects (male: 57, female: 16; age: 53 ± 16) and 99 PLWH subjects (male: 66, female: 30; age: 53 ± 11). Vessel morphometric features were quantified using intraCranial Artery Feature Extraction (iCafe) to investigate associations between vessel morphometry, markers of monocytes, endothelial cell activation, and cognitive performance. Results: HIV status predicted a lower total number of branches (β = -0.224, p = 0.001, d = -0.517) and shorter total distal length (β = -0.173, p = 0.021, d = -0.370) with a moderate effect size. Total branch number was negatively associated with plasma levels of monocyte markers (sCD14: r = -0.167, p = 0.033; sCD163: r = -0.157, p = 0.045) and positively correlated with white matter cerebral blood flow (r = 0.550; p ≤ 0.05). HIV status was the strongest predictor of overall cognitive performance in the ANCOVA model (β = -0.219, p = 0.006, d = -0.453). Conclusions: Our results suggest that cognitive impairment in PLWH is associated with vessel morphology metrics. Monocyte immune activation may contribute to changes in vessel morphology.

19

Optical coherence tomography as a biomarker for frontotemporal dementia: a systematic review & meta-analysis

Wang, E.; Kohli, A.; Taha, H. B.

2026-05-27 neurology 10.64898/2026.05.19.26353366 medRxiv

Top 14%

0.0%

Background: Frontotemporal dementia (FTD) lacks widely accessible disease-specific biomarkers. Optical coherence tomography (OCT) and OCT angiography (OCTA) may provide non-invasive measures of retinal changes associated with neurodegeneration. We conducted a systematic review and meta-analysis evaluating retinal biomarkers in FTD compared with Alzheimer disease (AD) and controls. Methods: A systematic search of PubMed and Embase was conducted through April 25, 2026, according to PRISMA guidelines. Studies evaluating OCT/OCTA biomarkers in FTD with comparator groups were included. Inverse weighted random-effects models, publication bias assessments, and meta-regressions were performed. Results: Ten studies involving 139 individuals with FTD, 87 with AD, 29 with mild cognitive impairment, 14 with TDP-43 proteinopathy, 5 with tauopathy, and 255 controls were included in the systematic review; five studies were eligible for meta-analysis. Compared with AD, individuals with FTD demonstrated significantly lower retinal nerve fiber layer (RNFL) thickness (SMD = -0.61, 95% CI -0.98, -0.24). Compared with controls, individuals with FTD exhibited significantly lower ganglion cell layer-inner plexiform layer (GCL-IPL) thickness (SMD = -0.55, 95% CI -1.02, -0.08), whereas pooled analyses across multiple retinal biomarkers were non-significant (SMD = -0.19, 95% CI -0.52, 0.14). RNFL thickness correlated negatively with the percentage of female participants in FTD and positively with age in both AD and controls. Conclusions: Individuals with FTD exhibit lower RNFL thickness than individuals with AD and lower GCL-IPL thickness than controls, suggesting retinal alterations may reflect neurodegeneration. However, larger longitudinal studies with standardized OCT/OCTA protocols are needed to determine the diagnostic and prognostic utility of retinal biomarkers in FTD.
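
As a rough illustration of how a pooled SMD with a 95% CI is obtained from a random-effects model, the sketch below uses the DerSimonian-Laird estimator. This is an assumption: the abstract does not say which between-study variance estimator was used. The function name `dersimonian_laird` and the study-level inputs are hypothetical and are not data from the review.

```python
import math


def dersimonian_laird(effects, variances):
    """Random-effects pooling with the DerSimonian-Laird tau^2 estimate.

    Returns (pooled effect, (ci_low, ci_high), tau^2). Assumed estimator;
    the reviewed preprint may have used a different one.
    """
    w = [1.0 / v for v in variances]                 # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q measures dispersion around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]   # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), tau2


# Hypothetical study-level SMDs and their variances (illustration only).
toy_smd = [-0.9, -0.4, -0.7, -0.2, -0.6]
toy_var = [0.04, 0.06, 0.05, 0.08, 0.03]

pooled, (ci_low, ci_high), tau2 = dersimonian_laird(toy_smd, toy_var)
print(f"pooled SMD = {pooled:.2f}, 95% CI ({ci_low:.2f}, {ci_high:.2f}), tau^2 = {tau2:.3f}")
```

A confidence interval that excludes zero corresponds to the "significant" comparisons reported above, while one that spans zero corresponds to the non-significant pooled analysis.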

20

ERBB4 deficiency promotes atrial myopathy underlying the atrial fibrillation substrate

Yamaguchi, N.; Santucci, J.; Hong, S. J.; Ferrena, A.; Schlamp, F.; Willett, D.; Casdin, C. J.; Park, P. S.; Lin, X.; Xiao, J.; Hall, S.; Barnard, J.; Achter, J.; Kanhert, K.; Lundby, A.; Chung, M. K.; Van Wagoner, D. R.; Park, D. S.

2026-05-27 cardiovascular medicine 10.64898/2026.05.26.26354173 medRxiv

Top 14%

0.0%

Background: Atrial fibrillation (AF) is a leading cause of stroke, cardiovascular morbidity, and mortality. Atrial myopathy, characterized by progressive metabolic, electrical, and structural changes, creates the arrhythmogenic substrate that drives AF. Defining the key drivers of atrial myopathic processes is essential for targeted therapies that can mitigate AF progression. Here we explore how reduced ERBB4 expression contributes to the development of left atrial myopathy. Methods: We analyzed the Cleveland Clinic Biobank to compare left atrial ERBB4 levels in patients grouped by AF diagnosis. To investigate the impact of reduced ERBB4 levels on atrial tissue substrate, we created mouse models of cardiac-specific Erbb4 deficiency using Mlc2a (myosin light chain 2a)-Cre. Comprehensive physiological assessments were performed. Transcriptomic analyses of the left atrium were performed in an Erbb4 haploinsufficient mouse model and compared with human atrial datasets. Molecular validation of key dysregulated pathways was performed. Results: We found that left atrial ERBB4 levels are reduced in patients with AF. Adult cardiomyocyte-specific Erbb4 heterozygous (Erbb4fl/+;Mlc2a-Cre) mice exhibited prolonged P-wave duration in the absence of ventricular dysfunction. Left atrial transcriptomic analysis in Erbb4 haploinsufficient mice showed upregulation of pathways related to fibrosis, apoptosis, and coagulation, and downregulation of pathways related to fatty acid metabolism and mitochondrial function, mirroring changes observed in pressure overload mouse models. A cross-species transcriptomic comparison revealed significant overlap between ERBB4-correlated gene expression and functional pathways in adult human atria and mice with Erbb4 haploinsufficiency. Validating the transcriptomic data, protein and functional assays demonstrated increased fibrosis, apoptosis, and oxidative stress in the mutant left atrial tissue.
Conclusion: Left atrial ERBB4 levels are reduced in AF patients. A mouse model of Erbb4 deficiency and human atrial transcriptomic analyses highlight a role for ERBB4 in supporting normal atrial metabolism while protecting against inflammation, apoptosis, and fibrosis.